
    Streaming Similarity Self-Join

    We introduce and study the problem of computing the similarity self-join in a streaming context (SSSJ), where the input is an unbounded stream of items arriving continuously. The goal is to find all pairs of items in the stream whose similarity is greater than a given threshold. The simplest formulation of the problem requires unbounded memory, and is thus intractable. To make the problem feasible, we introduce the notion of time-dependent similarity: the similarity of two items decreases with the difference in their arrival times. By leveraging the properties of this time-dependent similarity function, we design two algorithmic frameworks to solve the SSSJ problem. The first one, MiniBatch (MB), uses existing index-based filtering techniques for the static version of the problem and combines them in a pipeline. The second framework, Streaming (STR), adds time filtering to the existing indexes and integrates new time-based bounds deeply into the workings of the algorithms. We also introduce a new indexing technique (L2), which is based on an existing state-of-the-art indexing technique (L2AP) but is optimized for the streaming case. Extensive experiments show that the STR algorithm, when instantiated with the L2 index, is the most scalable option across a wide array of datasets and parameters.
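
    The abstract does not spell out the decay function, so the sketch below assumes an exponential decay of cosine similarity with the arrival-time gap; the item vectors, the rate `lam`, and the quadratic comparison loop are illustrative stand-ins rather than the MB or STR frameworks. It does show why a time-dependent similarity makes bounded memory possible: once the decay factor alone falls below the threshold, older items can be safely evicted.

```python
import numpy as np

def time_decayed_similarity(x, y, t_x, t_y, lam=0.01):
    # Cosine similarity damped by an exponential decay in the arrival-time gap.
    # The exponential form is an assumption made for this illustration.
    cos = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return cos * np.exp(-lam * abs(t_x - t_y))

def streaming_self_join(stream, threshold=0.7, lam=0.01):
    # Naive quadratic baseline: compare each arriving item with the retained ones.
    # Since the decayed similarity is at most exp(-lam * dt), items older than
    # ln(1/threshold)/lam can never reach the threshold and are evicted,
    # which is what keeps memory bounded.
    horizon = np.log(1.0 / threshold) / lam
    buffer = []  # (timestamp, vector) pairs still within the time horizon
    for t, x in stream:
        buffer = [(s, v) for s, v in buffer if t - s <= horizon]
        for s, v in buffer:
            if time_decayed_similarity(x, v, t, s, lam) >= threshold:
                yield (s, t)
        buffer.append((t, x))
```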

    Absorbing random-walk centrality: Theory and algorithms

    We study a new notion of graph centrality based on absorbing random walks. Given a graph G=(V,E) and a set of query nodes Q⊆V, we aim to identify the k most central nodes in G with respect to Q. Specifically, we consider central nodes to be absorbing for random walks that start at the query nodes Q. The goal is to find the set of k central nodes that minimizes the expected length of a random walk until absorption. The proposed measure, which we call k absorbing random-walk centrality, favors diverse sets, as it is beneficial to place the k absorbing nodes in different parts of the graph so as to "intercept" random walks that start from different query nodes. Although similar problem definitions have been considered in the literature, e.g., in information-retrieval settings where the goal is to diversify web-search results, in this paper we study the problem formally and prove some of its properties. We show that the problem is NP-hard, while the objective function is monotone and supermodular, implying that a greedy algorithm provides solutions with an approximation guarantee. On the other hand, the greedy algorithm involves expensive matrix operations that make it prohibitive to employ on large datasets. To confront this challenge, we develop more efficient algorithms based on spectral clustering and on personalized PageRank.
    Comment: 11 pages, 11 figures, short paper to appear at ICDM 201
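
    As one concrete reading of the greedy algorithm mentioned above, the sketch below scores a candidate absorbing set with the classical fundamental-matrix formula for absorbing Markov chains and greedily adds the node that most reduces the expected walk length from the query nodes. It assumes a row-stochastic transition matrix P over a connected graph and a uniform starting distribution over the query nodes; this is exactly the kind of expensive matrix-inversion baseline that the paper's spectral-clustering and personalized-PageRank algorithms are meant to avoid.

```python
import numpy as np

def expected_absorption_time(P, absorbing, query):
    # Average number of steps for walks started at the query nodes to reach
    # the absorbing set, computed via the fundamental matrix N = (I - Q)^{-1}.
    n = P.shape[0]
    transient = [v for v in range(n) if v not in absorbing]
    idx = {v: j for j, v in enumerate(transient)}
    Q = P[np.ix_(transient, transient)]
    N = np.linalg.inv(np.eye(len(transient)) - Q)
    steps = N.sum(axis=1)  # expected steps to absorption from each transient node
    # Query nodes that are themselves absorbing are absorbed in zero steps.
    return np.mean([steps[idx[q]] if q in idx else 0.0 for q in query])

def greedy_absorbing_centrality(P, query, k):
    # Greedy selection: repeatedly add the node whose inclusion in the
    # absorbing set most reduces the expected walk length from the query nodes.
    chosen = set()
    for _ in range(k):
        candidates = [v for v in range(P.shape[0]) if v not in chosen]
        best = min(candidates,
                   key=lambda v: expected_absorption_time(P, chosen | {v}, query))
        chosen.add(best)
    return chosen
```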

    Community-aware network sparsification

    Network sparsification aims to reduce the number of edges of a network while maintaining its structural properties; such properties include shortest paths, cuts, spectral measures, or network modularity. Sparsification has multiple applications, such as speeding up graph-mining algorithms, graph visualization, and identifying the important network edges. In this paper we consider a novel formulation of the network-sparsification problem. In addition to the network, we also consider as input a set of communities. The goal is to sparsify the network so as to preserve the network structure with respect to the given communities. We introduce two variants of the community-aware sparsification problem, leading to sparsifiers that satisfy different community-connectedness properties. From the technical point of view, we prove hardness results and devise effective approximation algorithms. Our experimental results on a large collection of datasets demonstrate the effectiveness of our algorithms.
    https://epubs.siam.org/doi/10.1137/1.9781611974973.48
    Accepted manuscript
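
    The abstract does not describe the two problem variants in detail, so the snippet below is only a naive baseline for the connectedness flavour of the problem, not one of the paper's approximation algorithms: for each input community it keeps a spanning forest of the community's induced subgraph, so that every community stays internally connected (component by component) in the sparsified network. The helper name and the use of networkx are illustrative choices.

```python
import networkx as nx

def community_connectivity_sparsifier(G, communities):
    # Illustrative baseline, not the paper's algorithm: keep, for every
    # community, a spanning forest of its induced subgraph, so that each
    # community remains internally connected in the sparsified network.
    H = nx.Graph()
    H.add_nodes_from(G.nodes(data=True))
    for community in communities:
        induced = G.subgraph(community)
        for component in nx.connected_components(induced):
            forest = nx.minimum_spanning_tree(induced.subgraph(component))
            H.add_edges_from(forest.edges(data=True))
    return H
```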

    A Motif-based Approach for Identifying Controversy

    Among the topics discussed in social media, some lead to controversy. A number of recent studies have focused on the problem of identifying controversy in social media, mostly based on the analysis of textual content or on the global network structure. Such approaches have strong limitations, due to the difficulty of understanding natural language and of investigating the global network structure. In this work we show that it is possible to detect controversy in social media by exploiting network motifs, i.e., local patterns of user interaction. The proposed approach allows for a language-independent, fine-grained, and efficient-to-compute analysis of user discussions and their evolution over time. The supervised model exploiting motif patterns achieves 85% accuracy, an improvement of 7% over baselines that use structural, propagation-based, and temporal network features.
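
    The abstract does not list the concrete motifs or the classifier, so the sketch below is one plausible instantiation rather than the paper's pipeline: it uses the counts of the 16 directed triad types of a who-interacts-with-whom graph as a language-independent feature vector and feeds them to a logistic-regression model. The graph construction, the triad features, and the labeled training data are all assumptions.

```python
import networkx as nx
import numpy as np
from sklearn.linear_model import LogisticRegression

def motif_features(G):
    # Normalized counts of the 16 directed triad types in an interaction graph
    # (e.g., who replies to whom); a stand-in for the paper's motif patterns.
    census = nx.triadic_census(G)
    counts = np.array([census[t] for t in sorted(census)], dtype=float)
    total = counts.sum()
    return counts / total if total > 0 else counts

def train_controversy_classifier(graphs, labels):
    # graphs: list of nx.DiGraph discussion graphs; labels: 1 = controversial,
    # 0 = not. Both are hypothetical inputs used only for this illustration.
    X = np.vstack([motif_features(G) for G in graphs])
    return LogisticRegression(max_iter=1000).fit(X, np.asarray(labels))
```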